1 INTRODUCTION


The human genome project is one of the most significant activities in contemporary science. It has had a profound impact on evolutionary biology, the molecular sciences, and biomedicine, and its importance will increase as the project approaches completion. The stated goal of the project is to obtain the complete sequence of the human genome by the year 2005. The genome contains approximately 3.3 Gb (billion base pairs). As of September 7, 1998, 0.20 Gb (6.1%) had been sequenced and deposited in the sequence database. The current pace of the worldwide sequencing effort is about 0.1 Gb per year. Roughly half of that effort is supported by the US Government, at an anticipated cost of $2.5 billion over the lifetime of the project. Approximately one-third of the US effort is supported by the Department of Energy (DOE). The DOE program includes the Joint Genome Institute (JGI), a collaborative enterprise involving the Lawrence Berkeley, Lawrence Livermore, and Los Alamos National Laboratories.
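The figures in the paragraph above can be cross-checked with a few lines of arithmetic. The Python sketch below recomputes the fraction sequenced and shows why the 1998 pace was a concern: at 0.1 Gb per year the remaining sequence would take roughly three decades. The seven-year horizon passed to `required_pace` is an assumption for illustration (roughly September 1998 to the end of 2005), not a figure stated in the text.

```python
# Back-of-the-envelope check of the September 1998 sequencing status.
GENOME_GB = 3.3          # approximate size of the human genome (billion base pairs)
SEQUENCED_GB = 0.20      # sequenced and deposited as of September 7, 1998
PACE_GB_PER_YEAR = 0.1   # approximate worldwide sequencing pace at that time


def fraction_sequenced(done_gb: float, total_gb: float) -> float:
    """Fraction of the genome already sequenced."""
    return done_gb / total_gb


def years_remaining(done_gb: float, total_gb: float, pace_gb_per_year: float) -> float:
    """Years needed to finish if the pace stays constant."""
    return (total_gb - done_gb) / pace_gb_per_year


def required_pace(done_gb: float, total_gb: float, years_left: float) -> float:
    """Average pace (Gb per year) needed to finish within `years_left` years."""
    return (total_gb - done_gb) / years_left


if __name__ == "__main__":
    print(f"sequenced: {fraction_sequenced(SEQUENCED_GB, GENOME_GB):.1%}")
    print(f"years at current pace: "
          f"{years_remaining(SEQUENCED_GB, GENOME_GB, PACE_GB_PER_YEAR):.0f}")
    # Assumed horizon of about seven years (hypothetical, for illustration only).
    print(f"pace needed over 7 years: "
          f"{required_pace(SEQUENCED_GB, GENOME_GB, 7.0):.2f} Gb/year")
```

Run as a script, this prints about 6.1% sequenced, about 31 years at the 1998 pace, and a required average of about 0.44 Gb per year over the assumed seven years, more than four times the pace quoted.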

In 1997, JASON conducted a DOE-sponsored study of the human genome project, with special emphasis on the areas of technology, quality assurance and quality control, and informatics. The Report of that study (JASON Report JSR-97-315) is available from the JASON Program Office and on the internet at http://www.ornl.gov/hgmis/publicat/miscpubs/jason/. A brief synopsis of the Report was published earlier this year in Science (Koonin, S.E., 279, 36-37, 1998). The present study has two aims: first, to update the 1997 Report in light of recent developments in genome sequencing technology, and second, to consider possible roles for the DOE in the “post-genomic” era, following acquisition of the complete human genome sequence. The phrase “exploiting the genome” refers to various scientific and technical approaches to mining the burgeoning database of genome sequence information in order to understand the functional consequences of that information. This area of research is often referred to as “functional genomics.”
2 TECHNOLOGY


The 1997 JASON Report made the following specific recommendations:

Technology


1. The DOE should play a leading role in technology development.

2. Improvements should be sought in Sanger dideoxy sequencing technology.

3. Funding should be increased for alternative sequencing technologies not based on gel electrophoresis.

4. The DOE sequencing centers should maintain flexibility with regard to incorporating new technologies.


Quality Assurance and Quality Control


5. QA/QC considerations should be made integral to the human genome project.

6. QA/QC issues should be treated quantitatively.

7. Standardized QA/QC protocols should be implemented across all sequencing centers.

Informatics


8. Let the “customers” determine what informatics tools should be made available.

9. Encourage standardization of data formats and provide translators to those common formats.

10. Maintain flexibility with regard to database structure and query operations.
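Recommendation 6 asks that QA/QC be treated quantitatively. One widely used convention for doing so, assumed here rather than prescribed by the 1997 Report, is the Phred-style quality score, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch converts between the two scales and sums per-base error probabilities into an expected error count for a read; the acceptance threshold `max_error_rate` is a hypothetical parameter that a standardized protocol would fix.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred-style quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-style quality score for an error probability: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def expected_errors(qualities: Sequence[float]) -> float:
    """Expected number of miscalled bases in a read, given per-base quality scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


def passes_qc(qualities: Sequence[float], max_error_rate: float) -> bool:
    """Hypothetical acceptance test: mean per-base error probability <= threshold."""
    if not qualities:
        return False  # an empty read carries no usable sequence
    return expected_errors(qualities) / len(qualities) <= max_error_rate


if __name__ == "__main__":
    read_quals = [40] * 450 + [20] * 50   # a made-up 500-base read
    print(f"expected errors: {expected_errors(read_quals):.3f}")
    print("passes 1-in-1,000:", passes_qc(read_quals, 1e-3))
```

For the made-up read, the 50 lower-quality bases dominate: the read carries about 0.545 expected errors and fails a one-error-per-thousand-bases threshold, even though 90% of its bases are at Q40.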

A major recommendation of the 1997 JASON Report concerned the need for greater attention and increased funding in the area of technology development. Some concern was expressed as to whether the existing sequencing technology would be sufficient to complete the human genome project by 2005. Even if that goal could be met, clear benefit would be derived from more robust and higher-throughput DNA sequencing methods.

In April 1998, the DOE Office of Biological and Environmental Research issued a call for proposals in the area of genome instrumentation research (Program Notice 98-16, available at http://www.er.doe.gov/production/grants/fr98_16.html). That Notice solicited new applications involving “substantive improvements to current systems and novel and creative new strategies” for genome instrumentation. A total of $2 million will be available under this Program for awards to be made in FY 1999. This is a positive step and precisely the type of activity that is needed for the DOE to maintain its leadership role in technology development. However, the amount of funding is only about 2% of the total annual DOE budget for genome sequencing and may not be sufficient to develop and critically evaluate new technologies. It may be appropriate to increase funding for this Program in FY 2000, depending on the scope and quality of the proposals received in response to Program Notice 98-16.

The most dramatic development in human genome sequencing over the past year was the announcement on May 10, 1998 by The Institute for Genomic Research (TIGR) and Perkin-Elmer (PE) of their plan to form a joint venture to determine the complete human genome sequence by the year 2001. This would involve the formation of a new company, later named Celera Genomics, to be headed by TIGR President J. Craig Venter. The key to this private initiative is a set of enhanced DNA sequencing technologies centered on the new PE Applied Biosystems “ABI PRISM 3700” automated DNA sequencer. The new instrument employs capillary gel (rather than traditional slab gel) electrophoresis and is designed to perform 960 sequence reads per day. It uses robotic liquid transfer and gel loading to reduce operator time to only 15 minutes per day. In contrast, the current-generation ABI PRISM 377 instrument performs 144 sequence reads and requires 8 hours of operator time per day. The PRISM 3700 will not be available to the general scientific community until summer 1999. It is projected to cost $300,000, about three times the cost of a PRISM 377, but is expected to be more economical to operate because of lower labor costs and reduced consumption of reagents.
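The instrument figures quoted above can be compared on a per-operator-hour and per-dollar basis. In the sketch below, the PRISM 377 price of $100,000 is inferred from the statement that the PRISM 3700 costs about three times as much; it is an assumption, not a quoted price. Labor and reagent costs, which the text identifies as the source of the 3700's operating advantage, are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    name: str
    reads_per_day: int
    operator_hours_per_day: float
    price_usd: float

    def reads_per_operator_hour(self) -> float:
        """Daily reads divided by daily hands-on operator time."""
        return self.reads_per_day / self.operator_hours_per_day

    def capital_per_daily_read(self) -> float:
        """Purchase price divided by daily read capacity (USD per read/day)."""
        return self.price_usd / self.reads_per_day


PRISM_3700 = Sequencer("ABI PRISM 3700", 960, 0.25, 300_000)
# Price inferred from "about three times the cost of a PRISM 377" (assumption).
PRISM_377 = Sequencer("ABI PRISM 377", 144, 8.0, 100_000)

if __name__ == "__main__":
    for s in (PRISM_3700, PRISM_377):
        print(f"{s.name}: {s.reads_per_operator_hour():,.0f} reads per operator-hour, "
              f"${s.capital_per_daily_read():,.0f} per read/day of capacity")
    print(f"throughput ratio: {PRISM_3700.reads_per_day / PRISM_377.reads_per_day:.1f}x")
```

On these figures the 3700 delivers about 6.7 times the daily reads of a 377, over 200 times the reads per operator-hour (3,840 versus 18), and a lower purchase price per unit of daily capacity.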

Other companies and research organizations have developed DNA sequencers based on capillary gel electrophoresis. Molecular Dynamics offers the “MegaBACE 1000,” which has been commercially available for more than a year. Like the PRISM 3700, this instrument performs ~1,000 sequence reads per day and uses automated sample injection to reduce operator time. SpectruMedix markets the “SCE9600,” a capillary gel electrophoresis DNA sequencer that incorporates technologies developed at the Ames Research Laboratories through funding from the DOE. Beckman Coulter sells the “CEQ 2000” DNA sequencer, which draws on many of the capillary gel electrophoresis innovations first developed for the company’s analytical capillary electrophoresis instruments.

The new capillary gel electrophoresis instruments incorporate several technological enhancements. Some can be applied to existing slab-gel electrophoresis DNA sequencers; others will carry forward to the microchannel-based devices that are expected to succeed capillary gel electrophoresis sequencers. These enhancements include:

• Improved dye-labeled terminators. Dideoxynucleotides bearing an attached fluorescent dye are used to indicate which of the four nucleotides terminates a fragment of a particular length. Improved dye systems, such as the PE “Big-Dye” dideoxynucleotides, employ fluorescence resonance energy transfer to couple a common energy donor to four different energy acceptors, each emitting at a distinct wavelength. This provides equivalent light intensity for each of the four nucleotides, thus improving the quality of the sequence reads.
• Engineered thermostable polymerases. A DNA polymerase, preferably one that operates at high temperature, is used to extend a DNA primer in the presence of dideoxynucleotides. Naturally occurring DNA polymerases are not optimized for the incorporation of dye-labeled dideoxynucleotides. However, variant forms of these enzymes have been developed that contain specific amino acid substitutions that improve the enzyme’s ability to accept the bulky substrates.

• Automated gel loading. Robotic liquid transfer and gel loading increase sequencing throughput while reducing operator time. This technology is useful now and will be essential for future generations of ultra-high-throughput automated DNA sequencers.

• Fixed-position CCD. Rather than scanning across a slab gel while tracking the boundaries between gel lanes, the PRISM 3700 and SCE9600 instruments use a fixed-position two-dimensional CCD array that monitors a bundle of discrete capillaries. The CCD array scans in the spectral dimension, allowing the operator to choose the wavelengths to be monitored.
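Because each dye-labeled terminator reports one nucleotide at its own wavelength, and energy-transfer dyes make the four signals comparably bright, a base can in principle be called by asking which channel dominates each peak. The toy caller below does only that. Real base-calling software also corrects for spectral overlap between dyes, mobility shifts, and uneven peak spacing, none of which is modeled here; the channel-to-base order in `BASES` and the ambiguity threshold `min_ratio` are hypothetical choices.

```python
from typing import Sequence, Tuple

# Hypothetical assignment of the four detection channels to bases.
BASES = "ACGT"


def call_bases(peaks: Sequence[Tuple[float, float, float, float]],
               min_ratio: float = 2.0) -> str:
    """Call one base per peak from four-channel fluorescence intensities.

    A peak is called as the base whose channel is brightest, or 'N' when the
    brightest channel is not at least `min_ratio` times the runner-up.
    """
    calls = []
    for peak in peaks:
        if len(peak) != 4:
            raise ValueError("expected four channel intensities per peak")
        ranked = sorted(range(4), key=lambda i: peak[i], reverse=True)
        top, second = peak[ranked[0]], peak[ranked[1]]
        if top <= 0 or (second > 0 and top / second < min_ratio):
            calls.append("N")  # ambiguous or empty peak
        else:
            calls.append(BASES[ranked[0]])
    return "".join(calls)


if __name__ == "__main__":
    peaks = [(9, 1, 0, 0), (0, 0, 8, 1), (5, 4, 0, 0), (0, 1, 0, 7)]
    print(call_bases(peaks))  # → AGNT
```

The third peak is called `N` because its two brightest channels are too close; this is the situation that more uniform dye intensities are meant to make rarer.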


None of the new capillary gel electrophoresis instruments departs from Sanger dideoxy sequencing methods. However, all provide a substantial increase in DNA sequencing throughput compared to current-generation instruments, especially when used in conjunction with the technological enhancements described above. This higher throughput makes it possible to consider alternative sequencing strategies that would otherwise be impractical, for example, the whole-genome shotgun sequencing strategy first proposed by Weber and Myers (Genome Res. 7, 401-409, 1997) and now being adopted by Celera.

The term “shotgun” refers to the many fragments of DNA that result when the input DNA is broken down into smaller pieces that are picked at random. These fragments are cloned, amplified, isolated individually, and further amplified to provide material for sequencing. The aim is to generate enough clones to furnish a statistical representation of the entire input DNA, embodied within a set of overlapping clones whose sequences can be assembled computationally. All genome sequencing strategies make use of shotgun cloning at some point. The standard approach within the human genome project has been to divide each chromosome into a complete set of either large-sized (80-300 kb) or medium-sized (35-45 kb) segments of DNA, which are then used as the input for shotgun cloning and sequencing. This approach is referred to as “directed sequencing.” In contrast, Celera’s strategy is to use the entire human genome as the input for shotgun cloning.
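The “statistical representation” described above is classically analyzed with the Lander-Waterman (Poisson) model: if N reads of length L start at random positions in a target of length G, the expected coverage is c = NL/G and the expected fraction of bases covered by at least one read is 1 - e^(-c). The simulation below checks that approximation against randomly placed reads. It is a sketch under simplifying assumptions: it ignores sequencing errors, cloning bias, and repeats, and reads are not allowed to run off the ends of the target, which slightly undercovers the edges.

```python
import math
import random


def simulate_coverage(genome_length: int, read_length: int, n_reads: int,
                      seed: int = 0) -> float:
    """Fraction of a target covered by n_reads reads placed uniformly at random."""
    rng = random.Random(seed)
    covered = bytearray(genome_length)  # 0 = uncovered, 1 = covered
    for _ in range(n_reads):
        start = rng.randrange(genome_length - read_length + 1)
        covered[start:start + read_length] = b"\x01" * read_length
    return sum(covered) / genome_length


def poisson_expected_coverage(genome_length: int, read_length: int, n_reads: int) -> float:
    """Lander-Waterman expectation: 1 - exp(-c) with c = N * L / G."""
    c = n_reads * read_length / genome_length
    return 1 - math.exp(-c)


if __name__ == "__main__":
    G, L = 1_000_000, 500
    for n in (2_000, 4_000, 10_000, 20_000):  # c = 1, 2, 5, 10
        print(f"c = {n * L / G:>4.1f}: simulated {simulate_coverage(G, L, n):.4f}, "
              f"expected {poisson_expected_coverage(G, L, n):.4f}")
```

The uncovered fraction falls off exponentially with coverage, which is why shotgun projects aim for several-fold redundancy rather than 1-fold.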

Celera proposes to employ 230 of the ABI PRISM 3700 instruments to perform ~200,000 sequence reads per day, each consisting of >500 base pairs of raw sequence data. Most of these sequences will be derived from the ends of small (~2 kb) inserts obtained by random digestion and shotgun cloning of total human genomic DNA (for more information, see Venter et al., Science 280, 1540-1542, 1998). A smaller number of sequences will be obtained from the ends of larger (~10 kb) random inserts, and a still smaller number from the ends of very large (~150 kb) inserts within a library of bacterial artificial chromosomes (BACs). Together, these will provide the raw data for assembly of the finished sequence. At a pace of 200,000 sequence reads per day, only about 350 sequencing days would be required to obtain 10-fold sequence coverage of the entire human genome. By relating the raw sequence data to physically mapped sequence tagged site (STS) and expressed sequence tag (EST) sequences, it should be possible to assemble the bulk of the human genome sequence.
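The schedule quoted above follows from simple arithmetic. The sketch assumes exactly 500 bases per read (the text says more than 500) and a 3.3 Gb genome; under those assumptions 10-fold coverage takes about 330 days, consistent with the “about 350 sequencing days” quoted, and 350 days at 200,000 reads per day yields the ~70 million end sequences mentioned in the next paragraph.

```python
GENOME_BP = 3.3e9
READ_LENGTH_BP = 500        # assumption: the text gives ">500" bases per read
READS_PER_DAY = 200_000
INSTRUMENTS = 230
RATED_READS_PER_INSTRUMENT = 960


def days_to_coverage(genome_bp, reads_per_day, read_length_bp, coverage):
    """Sequencing days needed to reach a target fold-coverage."""
    return genome_bp * coverage / (reads_per_day * read_length_bp)


def coverage_from_reads(n_reads, read_length_bp, genome_bp):
    """Fold-coverage c = N * L / G."""
    return n_reads * read_length_bp / genome_bp


if __name__ == "__main__":
    print(f"days for 10x: "
          f"{days_to_coverage(GENOME_BP, READS_PER_DAY, READ_LENGTH_BP, 10):.0f}")
    reads_350 = 350 * READS_PER_DAY
    print(f"reads in 350 days: {reads_350:,} -> "
          f"{coverage_from_reads(reads_350, READ_LENGTH_BP, GENOME_BP):.1f}x")
    per_instrument = READS_PER_DAY / INSTRUMENTS
    print(f"reads per instrument per day: {per_instrument:.0f} "
          f"(rated {RATED_READS_PER_INSTRUMENT})")
```

The last line is a useful sanity check: 200,000 reads per day over 230 instruments is about 870 reads per instrument, just under the rated 960.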

Unlike the directed sequencing strategy being implemented in a distributed manner by the JGI and other large sequencing centers, Celera aims to assemble the complete human genome sequence directly from a total of ~70 million end sequences. There is lively debate in the genomics community regarding the feasibility of this whole-genome shotgun sequencing approach. An important concern is the extent to which the completed sequence will contain gaps, and therefore not be truly complete. Celera anticipates that its sequencing strategy will leave several thousand gaps, although this may be a significant underestimate. Who will close the remaining gaps? A related concern is the extent to which the final sequence will be misassembled due to the incorrect arrangement of the component sequence segments. This will be an especially serious problem within regions of the genome that contain highly repetitive sequences. Celera’s announcement also raises a crucial question: how should members of the federally supported genomics community adjust their efforts in light of Celera’s new initiative?
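Under the idealized Lander-Waterman model, the expected number of gaps (equivalently, of separate contigs) is roughly N e^(-c(1 - θ)), where N is the number of reads, c the fold-coverage, and θ the fraction of a read length that two reads must share before their overlap can be detected. With N = 70 million and c = 10, the model gives about 3,200 gaps when θ = 0, in line with Celera’s estimate of several thousand. Because the model ignores repeats and regions that clone poorly, it is best read as a lower bound. The θ = 0.08 case (a 40-base minimum overlap on 500-base reads) is a hypothetical illustration of how sensitive the count is to assembly assumptions.

```python
import math


def expected_gaps(n_reads: float, coverage: float, min_overlap_frac: float = 0.0) -> float:
    """Lander-Waterman estimate of the number of gaps: N * exp(-c * (1 - theta))."""
    return n_reads * math.exp(-coverage * (1.0 - min_overlap_frac))


if __name__ == "__main__":
    n = 70e6
    print(f"theta = 0:    {expected_gaps(n, 10):,.0f} gaps")
    print(f"theta = 0.08: {expected_gaps(n, 10, 0.08):,.0f} gaps")
```

Requiring even a modest detectable overlap more than doubles the predicted gap count, before any allowance for repetitive sequence.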

The JGI has been quick to respond in a positive and pragmatic manner to the TIGR-PE announcement. The DOE had already helped to lay the groundwork for Celera’s sequencing campaign by funding work at TIGR and the University of Washington to sequence the ends of inserts within a library of BAC clones. These data will provide critical signposts for fixing the location of the vast number of end sequences that will be obtained from the smaller (2 and 10 kb) inserts. The JGI has proposed to test the shotgun sequencing strategy at the level of individual BAC clones. Each BAC clone of 100-200 kb will be divided into a set of smaller (~3 kb) plasmid inserts, and the ends of these inserts will be sequenced. About 750 plasmids will be end-sequenced to provide 5-fold sequence coverage of the parent BAC. The end sequences will then be assembled to obtain the complete BAC sequence. This strategy will yield sequence assemblies large enough to encompass a typical gene, and will do so more rapidly than the whole-genome shotgun sequencing approach.
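The JGI figures are mutually consistent if one assumes reads of about 500 bases and two reads (one from each end) per plasmid: 750 plasmids then give 750,000 bases, or 5-fold coverage of a 150 kb BAC. The sketch also applies the idealized Lander-Waterman gap estimate N e^(-c) across the stated 100-200 kb size range. Read length, reads per plasmid, and the use of the Lander-Waterman estimate are assumptions for illustration.

```python
import math


def fold_coverage(n_plasmids, reads_per_plasmid, read_length_bp, target_bp):
    """Bases sequenced divided by target length."""
    return n_plasmids * reads_per_plasmid * read_length_bp / target_bp


def plasmids_needed(coverage, target_bp, reads_per_plasmid=2, read_length_bp=500):
    """Plasmids to end-sequence for a target fold-coverage (assumed read parameters)."""
    return math.ceil(coverage * target_bp / (reads_per_plasmid * read_length_bp))


def expected_gaps(n_reads, coverage):
    """Idealized Lander-Waterman gap count, N * exp(-c), ignoring repeats."""
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    for bac_bp in (100_000, 150_000, 200_000):
        c = fold_coverage(750, 2, 500, bac_bp)
        print(f"{bac_bp // 1000} kb BAC: {c:.2f}x, ~{expected_gaps(1500, c):.0f} gaps")
    print("plasmids for 5x of 200 kb:", plasmids_needed(5, 200_000))
```

A fixed count of 750 plasmids gives 7.5-fold coverage of a 100 kb BAC but only 3.75-fold of a 200 kb BAC, so in practice the number of plasmids would presumably scale with clone size.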

The following recommendations are made with regard to genome sequencing technology:

1. Capitalize on and complement Celera’s efforts. Capillary gel electrophoresis DNA sequencers, such as the ABI PRISM 3700, together with associated technological enhancements, are certain to provide a dramatic increase in DNA sequencing throughput. It is less clear how difficult it will be to assemble the complete genome sequence from the vast number of end sequences that will be generated. By supporting the end-sequencing of BAC clones, the DOE has already made an important contribution to the assembly process. These clones may also be useful in closing many of the gaps left by Celera’s total shotgun sequencing approach. Other gaps, especially those in regions that are not represented within BAC clones, will require more laborious methods to achieve closure. There will be little incentive for a private company to carry the gap-closing process through to the end, although most biologists believe that there will be significant value in closing the gaps. It may turn out, however, that the most intractable gaps are not worth closing, provided that the size and location of these gaps are known. In any case, the JGI should be prepared to close those gaps that are worth closing.

2. Proceed with BAC shotgun sequencing. The plan to test the large-scale shotgun sequencing strategy at the level of individual BAC clones is a good one. In the near term, this will assist in the development of new software tools for sequence assembly and will reveal special problems that pertain to the wholesale assembly of human genomic sequence. This activity offers the best opportunity for synergy between the public and private sequencing efforts. In the long term, shotgun sequencing of individual BAC clones is likely to be useful in closing gaps left by Celera’s sequencing efforts and may be applicable to the sequencing of other eucaryotic genomes.

3. Transition to capillary gel electrophoresis sequencing. Over the next eight months, Celera will take delivery of 230 of the new PRISM 3700 sequencers. PE Applied Biosystems expects to make a small number of these instruments available to other large customers within the next year. At present, the JGI is largely committed to the current-generation instruments and is employing a sequencing strategy that would not benefit substantially from the new higher-throughput machines. This must change. Capillary gel electrophoresis provides a significant advance that will be followed by a series of incremental improvements over the next few years. Some of these technologies will be carried forward to the next generation of microchannel-based sequencing devices. The JGI should keep abreast of the state of the art in commercial DNA sequencing instrumentation by bringing a few of the commercially available capillary gel electrophoresis-based sequencers in-house at the earliest opportunity. This will allow first-hand evaluation of the performance characteristics of these instruments, possibly stimulate changes in the JGI sequencing strategy, and promote flexibility in considering the future operation of the sequencing facilities. As the JGI becomes more active in the shotgun sequencing of BAC clones, the high-throughput instruments will become more necessary.

4. Continue advanced technology development. The recent call for proposals in the area of genome instrumentation research is commendable. It is clear that technology developments over the past year have dramatically changed the landscape of human genome sequencing. There appears to be no limit to the need for further increases in DNA sequencing capacity. The genomics community cannot depend on a single company or single type of DNA sequencing methodology to ensure ongoing technological advancement. Development of new methods of sequencing not based on gel electrophoresis should be supported. More so than other agencies, the DOE has the obligation and capability to foster long-term technology development in the area of DNA sequencing.
3 FUNCTIONAL GENOMICS


A year ago one looked expectantly toward the “human index sequence,” the complete sequence of the human genome that would be available by 2005. Although there was uncertainty as to whether the task would be completed on time, surely it would be completed within a few years of that date. Now there is a possibility that the (nearly) complete sequence of the human genome will be in hand by 2001 or shortly thereafter. There is a plan afoot within the JGI and the genomics community to produce a “draft” sequence of the human genome by 2001, focusing initially on regions that are well mapped and likely to be of biological interest. The draft sequence would contain numerous sequence gaps but few physical gaps, and would serve as a platform to anchor the finished sequence. Pushing the notion of what might constitute a draft sequence, Incyte Pharmaceuticals recently announced a plan to map its existing database of EST sequences onto the physical map of the genome, declaring the result to be the first complete (though privately held) picture of the human genome. The question of what constitutes a “complete” genome sequence is open to scientific and philosophical debate. Regardless of one’s opinion, it is clear that the post-genomic era is about to begin.

The ultimate goal of exploiting the genome is to understand the function of each of the roughly 100,000 human genes, both individually and collectively, and to understand how those functions differ in different cell states, cell types, individuals, and organisms. Functional genomics is most meaningful when viewed across the entire genome, because understanding the interactions among gene products, whether to form structures, execute regulatory functions, or catalyze chemical reactions, is at least as important as understanding the functions of the gene products viewed individually. The function of a gene includes its physical locus within the genome, the primary RNA transcript that derives from the gene, the succession of RNAs along the processing pathway to a mature messenger RNA (mRNA), the protein product that results from translation of the mRNA, the succession of chemical and conformational changes that the protein undergoes, and the association of the protein into large multiprotein “machines.” It is also important to consider the substantial portion of the genome that does not encode a gene product but contains sequences that play important regulatory roles in gene expression. Functional genomics ultimately involves understanding all of biology, a task that will keep scientists occupied indefinitely.

It is clear that priorities must be set in exploring the endless frontier of functional genomics. The JASON study does not address scientific priorities at the level of investigator-initiated research. These priorities are best established by members of the biological research community and honed by the peer-review process. Rather, the study considers the role that the DOE might play in facilitating research in functional genomics at both academic institutions and the national laboratories. The following recommendations are made with regard to DOE support of functional genomics:

1. Provide a library of full-length cDNA clones. A substantial database of human EST sequences exists in both the public and private domains. Most of these sequences are of partial-length complementary DNA (cDNA), typically derived from the 3'-terminal portion of an expressed mRNA. It would be highly beneficial to generate an inventory of full-length cDNA clones and make them available to the scientific community. These clones would be useful for many aspects of functional genomics research, including expression analysis, mapping of cDNAs onto genomic sequence, and the preparation of proteins for biophysical studies. By analogy to the DOE’s highly successful operation of synchrotron facilities, the cDNA clones might be offered as a shared resource to be utilized by individual investigators for varied purposes.

2. Limit funding of polymorphism analysis. The sequence of the human genome is described not only by the “index sequence” but also by sequence polymorphisms that distinguish one individual from another. Single nucleotide polymorphisms (SNPs) occur at an overall frequency of about one per thousand nucleotides and at a much greater frequency within certain regions of the genome. The location, identity, and population distribution of these polymorphisms have considerable biological and biomedical significance. For example, SNPs might be used to stratify patient populations with respect to the treatment of disease. Not surprisingly, there is tremendous commercial interest in recognizing important SNPs and securing the associated intellectual property rights. The academic research community also has a strong interest in SNPs, with most of this work falling under the purview of the National Institutes of Health. This is a fertile area for research, and it is already receiving ample attention from other agencies. DOE’s special strengths lie elsewhere, and in our view DOE will have more impact by directing its limited resources to other areas of genome science.

3. Move aggressively into comparative genomics. Interpretation of the human genome sequence requires comparison with other genomes, especially those of closely related species. Celera has announced its intention to sequence the genome of the fruit fly as a “warm-up” project to be completed by spring 1999. It appears likely that Celera will attempt to sequence the mouse genome as well. The DOE has long supported comparative studies in mouse and human genomics and maintains a valuable genetic resource within its Mouse Source program at Oak Ridge National Laboratory. The mouse genome will be especially useful in identifying genes and regulatory regions within the human genome. The DOE should assume a strong role in comparative genome sequencing. This is an excellent use for the sequencing capacity that will be available following completion of the human genome, and it is an activity more closely associated with studies of biological diversity than with biomedical applications. The DOE already has established its place in comparative genomics by supporting the sequencing of archaeal and eubacterial genomes. It now is time to contemplate a broader sweep across the tree of life. In considering which genomes to sequence, the DOE should consult with individuals who have expertise in both eucaryotic taxonomy and molecular aspects of gene expression (such individuals are rare). The goal would be to acquire genome sequences at well-chosen phylogenetic intervals, preferably examining pairs of closely related organisms at each interval.

4. Assume responsibility for database management. The topic of sequence database management was discussed extensively in the 1997 JASON Report. The key recommendation in this area was that the DOE should promote community-wide standards for software operation and the quality of the entered data. It was suggested that database operations be viewed as a service offered to the marketplace of potential users, and that databases be made appropriate to the needs and level of computational sophistication of the users. Similar needs apply to functional genomics databases, although in this case the community is more fragmented. Functional genomics databases that are likely to have broad utility include listings of co-expressed genes referenced to cell type and cell state, indexes of protein-protein interactions, and a compendium of solved structural domains. As with DNA sequence databases, it will be essential to develop software tools that perform automated checks on the quality and completeness of the entered data. A modular approach is needed so that the authoring and publishing functions performed by individual investigators are separated from the cataloguing and data manipulation functions performed by database curators. This will allow investigators to focus on data acquisition in the face of changing research methods, while curators focus on data management in the face of changing computer technology.

5. Foster research on genome-wide expression analysis. The difference between functional genomics and traditional efforts in biochemistry and molecular biology is that the former is viewed from a genome-wide perspective. Functional genomics demands a high degree of parallelism in its analytical methods. Examples of these methods include sequence-comparison algorithms, DNA arrays, protein expression arrays, high-throughput functional assays, and targeted gene knockouts. The DOE can foster progress in functional genomics by supporting technology development, with emphasis on genome-wide technologies.
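The “listings of co-expressed genes” and genome-wide expression analysis recommended above rest on comparing expression profiles across many conditions at once. A minimal sketch, assuming each gene is summarized by a list of expression levels measured under the same set of conditions (for example, from DNA arrays), is to compute the Pearson correlation for every pair of genes and keep pairs above a threshold. Gene names, profiles, and the 0.9 threshold are hypothetical, and real analyses would normalize the data and correct for multiple comparisons.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(profiles: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """All gene pairs whose profiles correlate at or above `threshold`."""
    pairs = []
    for (g1, x), (g2, y) in combinations(sorted(profiles.items()), 2):
        r = pearson(x, y)
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical expression levels for four genes across four conditions.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.1, 3.9, 6.2, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [1.0, 1.0, 5.0, 5.0],
    }
    for g1, g2, r in coexpressed_pairs(profiles):
        print(f"{g1} ~ {g2}: r = {r:.3f}")
```

The all-pairs loop grows quadratically with the number of genes, which illustrates why genome-wide analysis of roughly 100,000 genes calls for the parallel methods and shared databases described above.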
4 SUMMARY


Genomics research is a sure bet. This will be one of the most prolific areas of scientific investigation over the next two decades. The human index sequence is only the beginning; it will be followed by studies involving polymorphism analysis, comparative genomics, and functional genomics that will revolutionize scientific understanding of biological systems. The DOE is playing a prominent role in genomics research and is positioned to have a major impact in the post-genomic era. Among the general lessons that emerged from this and the previous JASON study of the human genome project are the following: 1) continue to support advanced technology development, especially as it addresses the unquenchable thirst for enhanced DNA sequencing throughput; 2) be nimble in adopting new technologies as they become available; 3) serve the community by providing software and molecular services that have broad application. With these guiding principles, and the recognition that “big science” does not connote monolithic science, the road ahead promises to be exciting.